TITLE OF THE INVENTION: 

SYSTEM AND METHOD FOR ESTIMATING TRANSACTION COSTS RELATED TO 
TRADING A SECURITY 

RELATED CASES 

[0001] This application is based on and claims priority to provisional patent 
application number 60/464,962 filed on April 24, 2003, the entire contents of which 
are incorporated herein by reference. 

BACKGROUND OF THE INVENTION 

[0002] The performance of an investment is strongly related to execution costs 
related to the investment. Often with trading securities, transaction costs may be 
large enough to substantially reduce or even eliminate the return of an investment 
strategy. Therefore, achieving the most efficient order execution is a top priority for 
investment management firms around the globe. Moreover, the recent demand by 
some legislators and fund shareholder advocates for greater disclosure of 
commissions and other trading costs makes their importance even more pronounced 
(see, for example, Teitelbaum [14]). Therefore, understanding the determinants of 
transaction costs and measuring and estimating them are imperative. For further 
discussion see, for example, Domowitz, Glen and Madhavan [5] and Schwartz and 
Steil [13]. 

[0003] Traditionally, there appear to be two different approaches for estimating 
trading costs. The first approach is purely analytical and emphasizes 
mathematical/statistical models to forecast transaction costs. Typically, these models 
are based on theoretical factors/determinants of transaction costs and take into 
account, for instance, trade size and side, stock-specific characteristics (e.g., market 
cap, average daily trading volume, price, volatility, spread, bid/ask size, etc.), market 
and stock-specific momentum, trading strategy, and the type of the order (market, 
limit, cross, etc.). 

[0004] The modeling is focused primarily on price impact and, sometimes, 
opportunity cost. For example, Chan and Lakonishok [4] report that institutional 
trading impact and trading cost are related to firm capitalization, relative decision 
size, identity of the management firm behind the trade and the degree of demand for 
immediacy. Keim and Madhavan [9] focus on institutional style and its impact on 
their trading costs. They show that trading costs increase with trading difficulty and 
depend on factors like investment styles, order submission strategies and exchange 
listing. Breen, Hodrick and Korajczyk [2] define price impact as the relative change in 
a firm's stock price associated with its observed net trading volume. They study the 
relation between this measure of price impact and a set of predetermined firm 
characteristics. Typically, some of these factors are then selected and implemented 
in mathematical or econometrical models that provide transaction cost estimates 
depending on different trade characteristics and investment style. ITG ACE™ 
(Agency Cost Estimator), described in [7], is an example of an 
econometric/mathematical model that is based on such theoretical determinants. It 
measures execution costs using the implementation shortfall approach discussed in 
Perold [12]. See also [15] and [16] for other examples of this type of model. 
[0005] While the first approach implicitly assumes that past execution costs do 
not entirely reflect future costs, the second approach is specifically based on this 
principle. In the second approach, the focus is exclusively on the analysis of actual 
execution data, and resulting estimates are used primarily for post-trade analysis. 
Typically, executions are subdivided into segments called peer groups, then simple 
average estimates of transaction costs in each segment are built. Taking empirical 
averages, however, might cause problems. For example, cells with an insufficient 
amount of data often yield inaccurate and inconsistent estimates because of just a 
few outliers. 

[0006] The present invention incorporates ideas of both approaches above to 
provide an improved method for estimating transaction costs. 

SUMMARY OF THE INVENTION 

[0007] According to the present invention, a method is provided for estimating 
transaction costs for financial transactions, preferably equity trades. Estimates are 
built using historical execution data, which is split into different peer groups. 
However, instead of calculating simple average estimates, a more sophisticated 
methodology is applied to historical execution data to produce more robust and 
consistent forecasts. 

[0008] According to an embodiment of the present invention, a method is 
provided for creating a peer group database, which includes a step of collecting 
security transaction data for a preselected period of time, for a plurality of investment 
institutions. The transaction data includes identity of securities being traded, 
transaction order sizes, execution prices and execution times. The transaction data 
is grouped into a plurality of orders. A plurality of cost benchmarks are calculated for 
each of the orders. Transaction costs are estimated for each investment institution 
relative to the cost benchmarks. The data is stored. With these and other objects, 
advantages and features of the invention that may become hereinafter apparent, the 
nature of the invention may be more clearly understood by reference to the following 
detailed description of the invention, the appended claims, and the drawings attached 
hereto. 

[0009] According to another embodiment of the present invention, a method for 
ranking a first institutional investor's security transaction cost performance relative to 
transaction costs of other institutional investors is provided. The method includes a 
step of collecting security transaction data for a preselected period of time, for a 
plurality of investment institutions. The transaction data includes identity of securities 
being traded, transaction order sizes, execution prices, momentum and execution 
times. The transaction data is grouped into a plurality of orders. A plurality of cost 
benchmarks are calculated for each of the orders. Transaction costs are estimated 
for each investment institution relative to the cost benchmarks. The first institutional 
investor is ranked against the plurality of investment institutions for at least one of a 
number of factors. 

[0010] According to another embodiment of the present invention, a system is 
provided for ranking a first institutional investor's security transaction cost 
performance relative to transaction costs of other institutional investors. The system 
includes a processing means for collecting security transaction data for a 
preselected period of time, for a plurality of investment institutions, and for grouping 
said transaction data into a plurality of orders. The transaction data includes identity 
of securities being traded, transaction order sizes, execution prices, momentum and 
execution times. The processing means calculates a plurality of cost benchmarks 
for each of the plurality of orders, estimates transaction costs for each investment 
institution relative to the cost benchmarks, and ranks the first institutional investor 
against the plurality of investment institutions for at least one of a number of factors. 
The system also includes a storing means for receiving data from the processing 
means, storing said data, and making data available to the processing means. 

[0011] According to another embodiment of the present invention, a system is 
provided for ranking a first institutional investor's security transaction cost 
performance relative to transaction costs of other institutional investors. The system 
includes a processing unit and a database unit. The processing unit is coupled with 
a network and configured to collect security transaction data for a pre-selected 
period of time, for a plurality of investment institutions. The transaction data 
includes identity of securities being traded, transaction order sizes, execution prices, 
momentum and execution times. The processing unit is also configured to group the 
transaction data into a plurality of orders, to calculate a plurality of cost benchmarks 
for each of said plurality of orders, to estimate transaction costs for each order 
relative to the cost benchmarks, and to store the data in a database. The database 
unit is coupled with the processing unit and configured to communicate with the 
processing unit, store data and make data available to the processing unit. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0012] The invention will be described in detail with reference to the following 
drawings, in which like features are represented by common reference numbers and 
in which: 

[0013] Fig. 1 shows the preferred values, ranges and codes of categories for cost 
factors according to an embodiment of the present invention; 
[0014] Fig. 2 shows exemplary ranges and values for the cost factors shown in 
Table 1 of Fig. 1; 

[0015] Fig. 3 shows average trading costs for various categories and benchmarks 
of the sample shown in Fig. 2; 

[0016] Fig. 4 shows order-based dollar-weighted and equally weighted average trading 
costs for various categories and benchmarks of the sample shown in Fig. 2; 
[0017] Figs. 5-6 are graphs which compare median cost estimates obtained 
through different regression techniques; 

[0018] Fig. 7 is a graph comparing the 25th percentile estimates obtained for 
different regression techniques; 

[0019] Figs. 8-10 are graphs which compare estimated and realized cost 
percentiles versus trade sizes; 

[0020] Fig. 11 is a graph showing estimated and realized cost percentiles versus 
momentum factor; 

[0021] Figs. 12-14 are graphs which compare the estimated cumulative 
distribution function versus its empirical counterpart; 

[0022] Fig. 15 is a graph comparing the estimated cumulative distribution function 
with its empirical counterpart; 

[0023] Fig. 16 is a block diagram of an exemplary system for estimating 
transaction costs according to an embodiment of the present invention; and 
[0024] Fig. 17 is a screen shot of an exemplary page of an exemplary client 
interface. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
[0025] The present invention provides a novel system and method for estimating 
financial transaction costs associated with trading securities, and comparing 
institutional performance among peer institutions. Transactional data from various 
peer institutions is collected and analyzed on a periodic basis to create 
comprehensive data relating to transactions, orders and executions. The data can be 
manipulated and presented to a peer institution so that it can benchmark its 
performance against its competitors. Costs are measured by comparing the costs 
of a trade or order by an institution to one or more benchmarks, and then comparing 
costs between institutions for similar stocks under similar situations. 
[0026] The present invention will help institutional investors to manage their 
trading costs more efficiently by ranking the performance of investors relative to 
other peer group participants. The present invention will stimulate institutional 
investors to enhance their analytical environment using the most efficient trading 
execution tools (e.g., POSIT®, TriAct™, ITG SmartServers™, etc.) as well as 
advanced trading analytical products (e.g., TCA, ITG Opt™, ITG ACE™, ResRisk™, 
etc.). 

[0027] For the purpose of describing the present invention, orders are block 
orders of securities requiring the buying or selling of one thousand or more shares of 
at least one security. 

[0028] The present invention includes systems and methods for providing security 
transaction costs. The methodology is described first, followed by exemplary 
embodiments of systems for implementing the same. One skilled in the art will 
readily comprehend that the invention is not limited to the embodiments described 
herein, nor is it limited to specific programming techniques, software or hardware. 
[0029] A framework with two different clusterization approaches is provided: 
single executions and orders. Trades submitted by the same institution with the 
same order identifier, side and stock are assumed to belong to the same order. 
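
The order clusterization described above can be sketched in Python as follows. This is only an illustration: the `Execution` record and its field names are hypothetical, and the sample fills are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Execution:
    # Hypothetical record layout; the field names are assumptions.
    institution: str
    order_id: str
    side: str      # "buy" or "sell"
    symbol: str
    shares: int
    price: float


def group_into_orders(executions):
    """Group executions that share institution, order identifier, side and stock."""
    orders = defaultdict(list)
    for ex in executions:
        key = (ex.institution, ex.order_id, ex.side, ex.symbol)
        orders[key].append(ex)
    return dict(orders)


# Invented sample data: two fills of one buy order, plus two unrelated trades.
fills = [
    Execution("inst_a", "o1", "buy", "XYZ", 600, 10.00),
    Execution("inst_a", "o1", "buy", "XYZ", 400, 10.10),
    Execution("inst_a", "o1", "sell", "XYZ", 500, 10.05),
    Execution("inst_b", "o1", "buy", "XYZ", 1000, 10.02),
]
orders = group_into_orders(fills)
```

Here the two buy fills of institution "inst_a" fall into one order, while the sell and the other institution's trade each form their own order.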
[0030] To build the cost estimates, the transaction cost of each trade or 
order/trading decision is estimated against a number of benchmarks. Though the 
true costs to an institutional trader may include costs such as commission costs, the 
administrative costs of working an order, as well as the opportunity costs of missed 
trades, the present invention focuses primarily on costs represented by price impact. 
This price impact can be explained as the deviation of the executed price from an 
unperturbed price that would prevail had the trade not occurred. 

[0031] The following benchmarks can be used for estimating transaction costs: 

C_{T-1} - the closing price of the stock on the day prior to the day of execution for 
executions (or on the day prior to the trading decision for orders); 

V_T - the volume-weighted average price (VWAP) across all trades during the first 
day of the trade execution for executions (or during the first trading day of the period 
over which the decision was executed for orders); 

C_{T+1} - the closing price of the stock on the first day after execution for executions 
(or on the first day after the last fill of the decision for orders); 

C_{T+20} - the closing price of the stock on the 20th day after execution for executions 
(or on the 20th day after the first trading day of the period over which the decision 
was executed for orders); 

O_T - the open price of the stock on the day of execution for executions (or on the 
day of trading decision for orders); and 

M_T - the prevailing midquote of the stock prior to execution time for executions (or 
prior to time of trading decision for orders). 

[0032] Benchmark C_{T-1} is described more fully in Perold [12]. Benchmark V_T is 
described in detail by Berkowitz, Logue, and Noser [1]. The benchmark M_T is 
probably the purest form of unperturbed price that one could choose, as opposed to 
C_{T-1}, for example, because it does not depend on other trades that occur between 
closing and time of execution. All three benchmarks (C_{T-1}, V_T and M_T) are widely 
used in practice both for cost measurement and trading performance evaluation, and 
will be understood by one of ordinary skill in the art. Although the benchmark VWAP 
is widely used, it is generally not considered to be appropriate for evaluation of large 
order executions, because it can be "gamed" by avoiding trading late in the day if 
prices appear to be worse than the VWAP price. See, for instance, Madhavan [11] 
for more details and Lert [10] for analysis of differences between various cost 
measurement methods. 
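
As a rough illustration of the VWAP benchmark, the following Python sketch computes a volume-weighted average price from one day's trades. The function name `vwap` and the (price, shares) tuple layout are assumptions made for this example only.

```python
def vwap(trades):
    """Volume-weighted average price of (price, shares) pairs for one day."""
    trades = list(trades)  # allow generators; two passes are needed
    total_shares = sum(shares for _, shares in trades)
    if total_shares <= 0:
        raise ValueError("total traded volume must be positive")
    return sum(price * shares for price, shares in trades) / total_shares


# Invented sample: 100 shares at 10.00 and 300 shares at 11.00.
day_vwap = vwap([(10.0, 100), (11.0, 300)])
```

The larger trade pulls the result toward its own price, so the sample evaluates to 10.75 rather than the simple mean of 10.50.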

[0033] Transaction costs can be calculated in basis points according to the 
formula: [(P^* - P_b) / P_b] \cdot \delta \cdot 10,000 (Eq. 1); where P^* is the actual 
execution price, P_b is the benchmark price and \delta is set to 1 or -1 in case of a sell 
or buy order, respectively. Positive trading costs show outperformance, which means 
that the trading decision resulted in profit. 
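
A minimal Python sketch of Eq. 1 is given below. The function name `cost_bps` and the string encoding of the order side are assumptions made for illustration.

```python
def cost_bps(exec_price, benchmark_price, side):
    """Signed transaction cost in basis points per Eq. 1.

    Positive values indicate outperformance relative to the benchmark.
    """
    if benchmark_price <= 0:
        raise ValueError("benchmark price must be positive")
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    delta = 1 if side == "sell" else -1  # sell: +1, buy: -1
    return (exec_price - benchmark_price) / benchmark_price * delta * 10_000


# Selling above the benchmark is favorable; buying above it is not.
sell_cost = cost_bps(50.25, 50.00, "sell")
buy_cost = cost_bps(50.25, 50.00, "buy")
```

With these invented prices, the sell is about 50 basis points better than the benchmark and the buy is about 50 basis points worse.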

[0034] To compare transaction costs of one peer institution against the costs of 
other peer institutions under similar circumstances, cost estimates for median and 
other percentiles for each comparison framework are built into a database, or other 
storage means, called a Peer Group Database (PGD). A graphical user interface is 
preferably provided to allow users to view relative peer performance by both 
traditional measures, as well as trade characteristics. More precisely, trading costs of 
executions/orders can be grouped by a number of market and stock-specific cost 
factors, such as type, market capitalization, side, market, size (represented 
by a percentage of average daily trading volume), and short-term momentum. 
These factors define scenarios. The preferred values and ranges of the exemplary 
cost factors are presented in Fig. 1. 

[0035] The six cost factors listed above have a significant impact on transaction 
costs, but numerous other factors are contemplated to be used, for instance, broker 
type (alternate broker, full-service broker, research broker, etc.), order type (market, 
limit, cross), daily volatility, and the inverse of dollar price of a stock (see e.g., 
Werner [17] or Chakravarty, Panchapagesan and Wood [3]). 

[0036] It is important to note that adding too many factors to the PGD may have 
some disadvantages. For example, the resulting product could become more 
complicated, but most importantly, if the amount of transaction data does not 
increase dramatically, the accuracy of estimates will deteriorate as the number of 
observations for each segment becomes insufficient. 

[0037] Referring to Fig. 1, the factor Type is preferably divided into Growth or 
Value stocks based on the methodology used by Russell 3000® in its indices 
(Russell is a registered trademark of the Frank Russell Company). Micro cap stocks 
are defined as stocks that are neither Growth nor Value stocks and have a market 
capitalization lower than 250 million dollars. Note that, by construction, it may 
happen that a stock belongs to both Growth and Value categories. 
[0038] The factor Market Capitalization classifies stocks into three market 
capitalization groups. For executions, the Market Capitalization is always based on 
the closing stock price C_{T-1} of the day prior to execution. For orders, the Market 
Capitalization is based on the closing stock price C_{T-1}, but on the day prior to trading 
decision. The threshold for Small cap stocks is 1.5 billion dollars. The threshold for 
Mid cap stocks is 10 billion dollars. 
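
The Market Capitalization grouping can be sketched in Python as below. The text gives only the two thresholds, so treating each threshold as an exclusive upper bound is an assumption, as are the function name and the use of shares outstanding times the prior close as the capitalization.

```python
SMALL_CAP_LIMIT = 1.5e9   # 1.5 billion dollars
MID_CAP_LIMIT = 10e9      # 10 billion dollars


def market_cap_group(shares_outstanding, prior_close):
    """Classify a stock as Small, Mid or Large cap from the prior closing price."""
    cap = shares_outstanding * prior_close
    if cap < SMALL_CAP_LIMIT:   # boundary handling is an assumption
        return "Small"
    if cap < MID_CAP_LIMIT:
        return "Mid"
    return "Large"


group = market_cap_group(100_000_000, 50.0)  # 5 billion dollars
```

For the invented inputs above the capitalization is 5 billion dollars, which falls into the Mid cap group.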

[0039] The factor Side comprises two categories: Buy and Sell. Preferably, no 
distinction is made between normal sells and short sells. 

[0040] For U.S. applications, the factor Market subdivides stocks into two 
categories: Listed and over-the-counter (OTC) stocks. However, for other 
international applications, the Market factor can be subdivided into any number of 
categories. 

[0041] The Size factor captures the (total) trade size of an execution (order). Size 
is measured relative to the average daily share volume (ADV), which is defined as 
the median daily dollar volume of the latest twenty-one trading days divided by the 
closing stock price of the day prior to execution, for executions, and the day prior to 
the trading decision for orders. 
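
The ADV definition and the resulting relative size can be illustrated with the following Python sketch. The function names and the requirement of a full 21-day window are assumptions made for the example.

```python
from statistics import median

ADV_WINDOW = 21  # latest twenty-one trading days


def adv_shares(daily_dollar_volumes, prior_close):
    """Median daily dollar volume of the last 21 days divided by the prior close."""
    window = list(daily_dollar_volumes)[-ADV_WINDOW:]
    if len(window) < ADV_WINDOW:
        raise ValueError("need at least 21 daily dollar volumes")
    if prior_close <= 0:
        raise ValueError("prior close must be positive")
    return median(window) / prior_close


def relative_size(order_shares, daily_dollar_volumes, prior_close):
    """Trade or order size as a fraction of ADV."""
    return order_shares / adv_shares(daily_dollar_volumes, prior_close)


# Invented history: 1 million dollars traded each day, prior close of 20 dollars.
history = [1_000_000.0] * 21
size = relative_size(500, history, 20.0)
```

In this invented case the ADV is 50,000 shares, so a 500-share order has a relative size of about 1% of ADV.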

[0042] The factor short-term Momentum is measured over the last two days prior 
to execution. Momentum measures the price evolution of a stock within the last two 
trading days as a fraction of absolute price changes. Specifically, 

M = (Q_n - Q_0) / \sum_{i=1}^{n} |Q_i - Q_{i-1}|, Eq. (2) 

where Q_0 and Q_n are the midpoints of the first and last valid primary quotes of the 
most recent two trading days and Q_i, 0 < i < n, is the midpoint of the i-th valid primary 
quote occurring immediately prior to each valid primary trade of the most recent two 
trading days. Succinctly, a valid primary quote or trade is a quote or trade of a stock 
that occurred under regular market conditions on the stock's primary exchange. 
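
Eq. (2) can be sketched in Python as follows. The function name `momentum` is hypothetical, and returning zero when the quote never moves is an assumption, since the text does not address a zero denominator.

```python
def momentum(midquotes):
    """Net price change divided by the sum of absolute changes (Eq. 2).

    `midquotes` holds the quote midpoints Q_0 .. Q_n in time order.
    """
    if len(midquotes) < 2:
        raise ValueError("need at least two midquotes")
    path = sum(abs(b - a) for a, b in zip(midquotes, midquotes[1:]))
    if path == 0:
        return 0.0  # assumption: a flat quote series has no momentum
    return (midquotes[-1] - midquotes[0]) / path


steady_rise = momentum([10.0, 10.5, 11.0])   # every move is upward
round_trip = momentum([10.0, 11.0, 10.0])    # up and back down
```

A series that only rises gives the maximum value of 1, a series that ends where it started gives 0, and by the triangle inequality every result lies in [-1, 1].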
[0043] The categories of each factor are preferably restricted to be used with 
other categories as follows: Type categories Value and Growth can be selected only 
with factors Market Capitalization and Side, and Type category Micro cap can be 
selected only with the factor Side. 

[0044] For scenarios that do not use the factors Size and Momentum, empirical 
distributions can be natural estimates for peer cost distributions. However, this is not 
true for the other cases. It is essential that cost estimates be consistent and 
close to each other for close values of size and momentum. In other words, the 
ranks of realized costs for two very similar scenarios should not differ very much. 
[0045] The present invention provides robust and consistent peer cost estimates 
for any choice of factors: Market Capitalization, Side, Market, Size and Momentum. 
Since Type is used in the present invention in conjunction with the factors Market 
Capitalization and/or Side only, it is not considered for simplicity. However, one 
having ordinary skill in the art will understand that factor Type can be easily 
incorporated in the methodology of the present invention. 

[0046] The methodology of the present invention provides estimates for cost 
percentiles for any values of Size and Momentum from [0, \infty) and [-1, 1], respectively. 
Therefore, the methodology provides much more flexibility than actually needed 
when values of Size and Momentum are subdivided into different groups, and can be 
applied even if the choice of the ranges for Size and Momentum is different from the 
ones shown above. 

[0047] The present invention is described next by way of example. Estimation 
methodology is based on US execution data from January 2002 to December 2002 
submitted by users of TCA. In this sample, the institutional trades represented 91 
firms. All institutions together accounted for 14.6 million trades, 82.7 billion shares 
and 2,067 billion total dollar value. The trades were clusterized into 6.4 million 
orders; an average order consisted of 2.3 executions. 
[0048] Fig. 2 shows descriptive statistics for the entire sample and its sub- 
samples based on the categories for each cost factor. The table presents the 
following information: the number of executions, the number of orders, the number of 
shares traded, the number of stocks (identified by unique CUSIPs) traded and total 
dollar volume. Statistics for factor Type show that the subdivision between Growth 
and Value stocks was quite even. Only a minority of executions and orders belongs 
to the Micro-cap category, although it contains the largest number of stocks. The 
subdivision with respect to market capitalization seems to be justified: executions 
and orders are evenly distributed among the three groups. As shown, the number of 
large cap stocks was the lowest, but the total dollar value is the highest. Small cap 
stocks are in the majority. The dollar volumes of buy and sell orders are 
approximately the same. Interestingly, the average size of sell orders is larger than 
the average size of buy orders. The overwhelming majority of executions and orders 
belong to the smallest size group, i.e. less than or equal to 1% of ADV. This poses 
a further challenge for building reasonable and robust cost estimates for the entire 
framework, including large trades and orders. Finally, the statistics for the 
momentum subdivision show that the majority of values of the momentum are close 
to zero. Moreover, negative values for momentum seem to outnumber the positive 
ones, which is expected due to the overall market trend for the period. 

Analysis of average realized transactions 

[0049] Figs. 3-4 present average transaction costs for different factors, 
benchmarks and clusterization types. For each scenario, two average costs are 
provided. The first value is the dollar weighted average trading cost, whereas the 
number in parenthesis indicates the equally weighted average trading cost. Note that 
dollar weighted and equally weighted averages are very different in most of the 
cases. By construction, the dollar weighted average depends mostly on a few large 
trades/orders only. In cases of symmetric distributions, the equally weighted average 
is identical to the median. From this perspective, the equally weighted averages can 
be more appropriate to analyze characteristics of peer group cost distributions. 
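
The difference between the two averages can be illustrated with a short Python sketch. The function names and the sample costs and dollar values are invented for this example.

```python
def equally_weighted_average(costs):
    """Plain mean: every trade or order counts the same."""
    costs = list(costs)
    return sum(costs) / len(costs)


def dollar_weighted_average(costs, dollar_values):
    """Mean weighted by the dollar value of each trade or order."""
    pairs = list(zip(costs, dollar_values))
    total = sum(value for _, value in pairs)
    if total <= 0:
        raise ValueError("total dollar value must be positive")
    return sum(cost * value for cost, value in pairs) / total


# Invented sample: one large trade with a negative cost, two small favorable ones.
costs_bps = [-5.0, 10.0, 20.0]
dollars = [9_000_000.0, 500_000.0, 500_000.0]
dw = dollar_weighted_average(costs_bps, dollars)
ew = equally_weighted_average(costs_bps)
```

In this sample the single large trade dominates the dollar weighted average (-3.0 b.p.), while the equally weighted average is about 8.33 b.p., so even the signs differ.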
[0050] Fig. 3 shows average transaction costs for executions for various 
categories of factors for six preferred benchmarks. Regarding the average trading 
costs for the benchmarks C_{T-1}, O_T and M_T, Growth stocks appear to have slightly 
higher average trading costs than Value stocks; by definition, Micro-cap stocks are 
very illiquid and thus encounter much higher average transaction costs. It is apparent 
from the values in Fig. 3 that trading costs are inversely related to Market 
Capitalization, and Listed stocks have lower average costs than OTC stocks, 
presumably because OTC stocks are, in general, more volatile. It can be 
observed that, on average, sell trades have positive costs while buys appear to have 
negative average costs. This observation holds for benchmarks C_{T-1} and O_T, which is 
very likely due to the overall negative market movement within the selected period. 
This assumption is confirmed by the reversed signs of average costs for sells and 
buys for post-trade benchmarks C_{T+1} and C_{T+20}. As expected, average trading cost 
decreases as trade size increases. No specific pattern could be found for the 
average trading costs in different momentum categories. 
[0051] For benchmark V_T, it is observed that most of the average costs are 
concentrated around zero for all categories that have been studied. The highest 
absolute value of average costs is 17 b.p. Average costs for Growth and Value 
stocks are close, while costs for Micro cap stocks are significantly negative. Similarly 
to the previous benchmarks, average trading costs seem to be inversely related to 
market capitalization and OTC stocks appear to have higher average costs than 
Listed stocks. However, in contrast to the pre-trade benchmarks, there is little 
difference between average costs for buys and sells (at least for the dollar weighted 
averages), which is likely due to the fact that, by construction, the VWAP benchmark 
is set for the day and is not affected by price movement within each day. Average 
cost behavior for Size and Momentum factors for V_T is similar to the case of 
pre-trade benchmarks. 

[0052] The post-trade benchmarks C_{T+1} and C_{T+20} yield quite different results. 
Benchmark C_{T+20} provides average costs that fluctuate substantially; for example, 
both dollar and equally weighted average costs have opposite signs for the same 
categories in some cases. Basically, the benchmark C_{T+20} does not seem to indicate 
any meaningful measure for price impact. Benchmark C_{T+1} provides average costs 
that have the reversed behavior of the pre-trade benchmark C_{T-1}. Costs overall are 
mostly positive, which indicates that on average, peer institutions have strong 
performance with respect to this benchmark. Micro cap stocks have the highest 
positive costs and executions of OTC stocks outperform those of Listed stocks. 
[0053] The analysis shows that average realized transaction costs of the 
exemplary data set are in line with empirical results presented by other researchers 
(see, for instance, Chakravarty, Panchapagesan and Wood [3]). The results strongly 
confirm that measuring costs with respect to different benchmarks affects 
performance evaluation significantly. In light of this fact, it seems to be a challenge to 
build a methodology that can be efficiently applied for all benchmarks discussed 
above. 

[0054] Fig. 4 displays analogous results for orders. 

[0055] Peer cost percentiles can be estimated for all benchmarks, clusterization 
types and possible choices of scenarios, assuming that at least one of the factors 
Size and Momentum has been selected. More precisely, the main result is to derive 
estimates of cost percentiles: 

X_i = cost percentile_{MarketCap=y_1, Side=y_2, Market=y_3, Size=y_4, Momentum=y_5}(i), Eq. (3) 

[0056] where y = (y_1, y_2, y_3, y_4, y_5) are arbitrary values for factors Market 
Capitalization, Side, Market, Size and Momentum, i \in [0, 100], and costs are 
measured relative to one of the six benchmarks discussed above. 
[0057] Before estimating X_i in Eq. (3), one must note that, first, while the factors 
Market Capitalization, Side and Market are discrete, Size and Momentum can have 
any values from [0, \infty) and [-1, 1], respectively. Consequently, Eq. (3) consists of an 
infinite number of functions and thus, an infinite number of estimates have to be 
derived. Second, a pure empirical approach might not be practical in all cases. 
Subdividing factors Size and Momentum into different groups and computing the 
empirical distribution for each scenario may lead to inconsistency and instability. As 
a result, performance of costs realized from two very similar scenarios may be 
ranked very differently, which may be confusing for users. Third, it is preferred to 
have a methodology that provides robust estimates and that works for both 
clusterization types and all six benchmarks C_{T-1}, V_T, C_{T+1}, C_{T+20}, O_T and M_T. This 
requirement is important since various benchmarks (for instance, V_T and C_{T-1}) have 
very different properties. 

[0058] In provisional application number 60/464,962, an ordinary least squares 
(OLS) method is described for providing estimates. The present invention does 
not focus on the mean or median only, but also provides estimates for the 25th, 40th, 
60th and 75th cost percentiles in addition to the median. Instead of regressing all 
the cost percentiles in the comparison framework directly on the (total) trade size 
and momentum values, the present invention subdivides the comparison framework 
into different groups depending on the Momentum and Size of the executions 
(orders). Then, for each group, the 25th, 40th, 50th (median), 60th and 75th cost 
percentiles are determined, as well as the equally weighted average values of 
momentum and (total) trade size. 
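
The grouping step can be sketched in Python as below. The bin edges, the minimum group size and the function names are hypothetical, and the standard library's "inclusive" quantile method is just one of several reasonable percentile definitions.

```python
from collections import defaultdict
from statistics import mean, quantiles

PERCENTILES = (25, 40, 50, 60, 75)
SIZE_EDGES = (0.01, 0.05, 0.25)   # hypothetical bin edges, as fractions of ADV
MOMENTUM_EDGES = (-0.3, 0.3)      # hypothetical bin edges


def bin_index(value, edges):
    """Index of the first edge not exceeded by value; len(edges) if none."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)


def summarize_groups(records, min_obs=2):
    """records: iterable of (size, momentum, cost_bps) tuples."""
    groups = defaultdict(list)
    for size, mom, cost in records:
        key = (bin_index(size, SIZE_EDGES), bin_index(mom, MOMENTUM_EDGES))
        groups[key].append((size, mom, cost))
    summary = {}
    for key, rows in groups.items():
        if len(rows) < min_obs:
            continue  # skip groups with too few observations
        cuts = quantiles([r[2] for r in rows], n=100, method="inclusive")
        summary[key] = {
            "n": len(rows),
            "mean_size": mean(r[0] for r in rows),
            "mean_momentum": mean(r[1] for r in rows),
            "percentiles": {p: cuts[p - 1] for p in PERCENTILES},
        }
    return summary


# Invented sample: five small trades with flat momentum and one isolated trade.
records = [(0.005, 0.0, c) for c in (-20.0, -10.0, 0.0, 10.0, 20.0)]
records.append((0.10, 0.5, 5.0))
summary = summarize_groups(records)
```

Each surviving group yields one row (mean size, mean momentum, five cost percentiles) that can then serve as an observation in a regression.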

[0059] Similar to the simple OLS approach, based on research conducted, all five 
percentiles are assumed to depend linearly on functions f and g of size and 
momentum, or, specifically, 

[0060] X_i = \alpha_i + \beta_i f(S) + \gamma_i g(M) + \epsilon_i, i = 25, 40, 50, 60 or 75. Eq. (4) 

[0061] Moreover, based on empirical research, it is assumed that f is positive, 
monotonically increasing, f(0) = 0, and g is either 

g(x) = x or g(x) = |x|^\nu, for some \nu > 0. 
[0062] A possible choice for f is f(x) = x^\mu, for x > 0 and some \mu > 0. 
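
Evaluating Eq. (4) for a given scenario can be sketched in Python as follows. The exponent names `mu` and `nu`, their default values and the coefficient values in the sample call are placeholders chosen for illustration, not fitted parameters.

```python
def f(size, mu=0.5):
    """Power function of size: zero at zero and increasing for mu > 0."""
    if size < 0:
        raise ValueError("size must be non-negative")
    return size ** mu


def g(momentum, nu=None):
    """g(x) = x when nu is None, otherwise |x| ** nu."""
    return momentum if nu is None else abs(momentum) ** nu


def predict_percentile(alpha, beta, gamma, size, momentum, mu=0.5, nu=None):
    """Fitted cost percentile from Eq. (4) without the error term."""
    return alpha + beta * f(size, mu) + gamma * g(momentum, nu)


# Placeholder coefficients for one percentile; size is 4% of ADV, momentum 0.5.
estimate = predict_percentile(-5.0, -40.0, 10.0, size=0.04, momentum=0.5)
```

With these placeholder values the estimate is -5 - 40 * 0.2 + 10 * 0.5, or about -8 basis points.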
[0063] In order to have a rough estimate for the whole peer cost distribution of a 
scenario, the percentiles between 25 and 75 can be computed by linear 
interpolation. Since transaction cost distributions are heavy-tailed, percentiles below 
25 and above 75 are derived assuming Pareto type of distributions. 
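
The linear interpolation between the five estimated percentiles can be sketched in Python as below. The function name and the sample estimates are invented, and the tail model for percentiles outside 25 to 75 is deliberately left out because the text does not give its parameters.

```python
KNOTS = (25, 40, 50, 60, 75)


def interpolate_percentile(p, estimates):
    """Linearly interpolate a cost percentile for 25 <= p <= 75.

    `estimates` maps each knot in KNOTS to its estimated cost in basis points.
    """
    if not KNOTS[0] <= p <= KNOTS[-1]:
        raise ValueError("tails outside 25..75 need a separate tail model")
    for lo, hi in zip(KNOTS, KNOTS[1:]):
        if lo <= p <= hi:
            weight = (p - lo) / (hi - lo)
            return estimates[lo] + weight * (estimates[hi] - estimates[lo])


# Invented percentile estimates for one scenario.
estimates = {25: -30.0, 40: -12.0, 50: -4.0, 60: 3.0, 75: 18.0}
p45 = interpolate_percentile(45, estimates)
```

Here the 45th percentile lies halfway between the 40th and 50th estimates, giving -8 basis points.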
[0064] Different regression estimation techniques can be chosen to estimate the 
regression parameters (\alpha_i, \beta_i, \gamma_i) in Eq. (4) by regressing the cost percentiles X_i on 
average values of momentum and size. Groups without a sufficient number of 
observations are preferably skipped from the regression in order to reduce noise as 
much as possible and ensure stability of the estimates. The present invention 
focuses on the following three regression techniques: (a) ordinary least squares 
(OLS), (b) weighted least squares (WLS) with respect to OLS residuals (WLS1), and 
(c) WLS with respect to observations in each subdivision (WLS2). 
[0065] The WLS1 approach is an enhancement of the OLS approach and 
comprises two steps: first, OLS regression is conducted and the residuals of the 
regression are determined; and second, the parameters are reestimated by 
weighting the observations with the inverse of their squared residuals. In order to 
avoid abnormal weighting, inverses of the squared residuals are truncated by the 
value (Z",,,^ )~\ 
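
The two steps of WLS1 can be sketched as follows, assuming NumPy is available. The function name `wls1` and the cap `max_weight` are assumptions made for this example; the cap is a placeholder for the truncation value, not the value used by the invention.

```python
import numpy as np

def wls1(f_size, g_mom, pct_cost, max_weight=1e4):
    """Two-step WLS1 sketch for one percentile level.

    f_size, g_mom: f(S) and g(M) evaluated at the group averages.
    pct_cost: the empirical cost percentile of each group.
    max_weight: assumed truncation of the inverse squared residuals.
    """
    f_size = np.asarray(f_size, dtype=float)
    g_mom = np.asarray(g_mom, dtype=float)
    pct_cost = np.asarray(pct_cost, dtype=float)
    X = np.column_stack([np.ones_like(f_size), f_size, g_mom])

    # Step 1: ordinary least squares and its residuals.
    coef_ols, *_ = np.linalg.lstsq(X, pct_cost, rcond=None)
    resid = pct_cost - X @ coef_ols

    # Step 2: re-estimate with weights 1 / residual^2, truncated from above.
    with np.errstate(divide="ignore"):
        w = np.minimum(1.0 / resid**2, max_weight)
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into OLS
    coef_wls, *_ = np.linalg.lstsq(X * sw[:, None], pct_cost * sw, rcond=None)
    return coef_wls  # (alpha_i, beta_i, gamma_i)
```

Without the truncation, a group whose OLS residual happens to be close to zero would receive an almost unbounded weight and dominate the second regression.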

[0066] Estimates become more robust due to the weighting. Moreover, based on 
research, squared residuals are generally the highest for large groups with large 
trade and order sizes. Weighting by the residuals increases the importance of cost 
percentiles for groups with smaller sizes. This is desirable since executions (and 
orders) with small (total) trade sizes are in the majority as pointed out above. 
[0067] Method WLS2 weights the importance of each group in a different way. 
Instead of weighting by the OLS residuals, WLS2 takes into consideration the 
amount of observed data in each subdivision and thus weights by the number of 
observations in each group. The problem with this method is that the number of 
observations might vary dramatically from group to group according to the data. The 
approach might yield reasonable results for some scenarios (usually for small trade 
sizes and momentum values close to zero) but provide bad estimates overall. 
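
For comparison, WLS2 differs from WLS1 only in the weights. The sketch below, with the hypothetical function name `wls2`, weights each group by its number of observations.

```python
import numpy as np

def wls2(f_size, g_mom, pct_cost, n_obs):
    """WLS2 sketch: weight each group by its observation count n_obs."""
    f_size, g_mom, pct_cost, n_obs = (
        np.asarray(a, dtype=float) for a in (f_size, g_mom, pct_cost, n_obs)
    )
    X = np.column_stack([np.ones_like(f_size), f_size, g_mom])
    sw = np.sqrt(n_obs)  # scaling rows by sqrt(weight) turns WLS into OLS
    coef, *_ = np.linalg.lstsq(X * sw[:, None], pct_cost * sw, rcond=None)
    return coef  # (alpha_i, beta_i, gamma_i)
```

Because the counts enter directly as weights, a handful of heavily populated groups can determine the fit almost entirely, which matches the weakness described above.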
[0068] The present invention has the advantage that it provides more information 
about the whole peer cost distribution. Moreover, it filters out outliers in a natural way 
by taking medians (and other percentiles) in each group. However, it should be 
noted that there is no theoretical justification how to subdivide groups optimally, and 
regressing percentiles on the average size and momentum is only an approximation. 
[0069] Figures 5-7 provide a comparison of the results for these three regression 
techniques. In each figure, the empirical percentiles are marked by points. 
[0070] Fig. 5 compares median cost estimates obtained by OLS, WLS1 and WLS2 
with empirical median costs. The dots denote the empirical medians. The solid line 
indicates the estimated median costs using the regression technique WLS1. The 
two dotted lines show median cost estimates for OLS and WLS2. All estimates have 
been derived using regression Eq. (3) for all executions in our data sample with 
f(x) = x and g(x) = 0. Costs are measured relative to benchmark $C_{T-1}$. Empirical 
percentiles have been regressed on average size and momentum values with linear 
functions f and g. The chart illustrates that all regression methods provide good estimates. 
[0071] Fig. 6 compares median cost estimates obtained by OLS, WLS1 and 
WLS2 with empirical median costs. The dots denote the empirical medians. The 
solid line indicates the estimated median costs using the regression technique 
WLS1. The two dotted lines show median cost estimates for OLS and WLS2. All 
estimates have been derived using regression Eq. (4) for all executions of 
Large cap stocks in our data sample with f(x) = x and g(x) = 0. Costs are measured 
relative to benchmark $C_{T-1}$. Instead of taking all executions into account, estimates 
have been derived for executions of Large cap stocks only. The functions f and g 
have again been chosen to be linear. Median cost estimates using OLS and WLS1 still do 
not differ considerably (WLS1 seems to yield slightly better results); however, method 
WLS2 provides unreasonable estimates for large trade sizes. 
[0072] Fig. 7 compares 25th-percentile estimates obtained by OLS, WLS1 and 
WLS2 with empirical 25th-percentiles of costs. The dots denote the empirical 
25th-percentiles of costs. The solid line indicates the estimated 25th-percentile using the 
regression technique WLS1. The two dotted lines show 25th-percentile estimates for 
OLS and WLS2. All estimates have been derived using regression Eq. (4) for 
all executions in our data sample; f and g have been selected according to Eq. (7) 
below. Costs are measured relative to benchmark $M_T$. By construction, WLS2 yields 
the best results for small trade sizes, but underperforms the two other techniques 
when trade sizes increase. 

[0073] Figs. 5-7 are typical examples of the overall performance of the techniques 
of the present invention. WLS1 is the most appropriate method for estimation of the 
five cost percentiles overall and provides consistent and robust estimates for all 
groups, for both executions and orders, and all benchmarks. 
Regression constraints 

[0074] Special attention should be paid to the fact that, without any 
constraints on the regression parameters $\alpha_i$, $\beta_i$ and $\gamma_i$, i = 25, 40, 50, 60 and 75, it 
could occur that for some pair (S, M), 

[0075] $X_i = \alpha_i + \beta_i f(S) + \gamma_i g(M) < \alpha_j + \beta_j f(S) + \gamma_j g(M) = X_j$, for i > j, Eq. (5) 
[0076] which is counterintuitive. 

[0077] To avoid such situations, constraints have to be assigned to the regression 
parameters. The constraints depend on the choice of benchmark and of function g. 
[0078] Accordingly, there are three restrictions for each scenario, benchmark and 
clusterization type. The first constraint suggests that for all cases, condition (5) 
should not hold for (S, M) = (0,0). In other words, we assume that $\alpha_i > \alpha_j$ for i > j. 
[0079] The second restriction takes into consideration that dispersion of costs 
should increase or decrease as size increases depending on the benchmark and 
clusterization type. Precisely, for i > j, 

$\beta_i < \beta_j$ for benchmark $V_T$ and clusterization type "executions"; 

$\beta_i > \beta_j$ otherwise. 

[0080] The last constraint depends on the choice of the function g and on the type 
of benchmark. Typically, it is a technical condition on the parameters $\gamma_i$ that ensures 
that condition (5) does not occur. 
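
The first two restrictions can be checked mechanically. The sketch below is an illustration only: the function name `check_constraints`, the flag `beta_decreasing` (set for the case in which the slope ordering is reversed) and the parameter layout are assumptions, and the third, benchmark-specific condition is not covered.

```python
def check_constraints(params, beta_decreasing=False):
    """Report violations of the intercept and slope orderings.

    params maps a percentile level (25, 40, ...) to a tuple
    (alpha, beta, gamma). Returns a list of (name, higher, lower) tuples.
    """
    levels = sorted(params)
    violations = []
    # Strict orderings are transitive, so adjacent pairs are sufficient.
    for lower, higher in zip(levels, levels[1:]):
        a_lo, b_lo, _ = params[lower]
        a_hi, b_hi, _ = params[higher]
        if not a_hi > a_lo:  # first restriction: intercepts increase
            violations.append(("alpha", higher, lower))
        beta_ok = b_hi < b_lo if beta_decreasing else b_hi > b_lo
        if not beta_ok:      # second restriction: slope ordering
            violations.append(("beta", higher, lower))
    return violations
```

An empty list means that both orderings hold for every pair of percentile levels.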

[0081] Finally, if any of these constraints is violated, the regression parameters 
($\alpha_i$, $\beta_i$, $\gamma_i$) are adjusted relative to the median parameters ($\alpha_{50}$, $\beta_{50}$, $\gamma_{50}$). This approach 
guarantees that the medians, as the most important percentile estimates, have no 
regression constraints, and thus, remain unaffected by possible adjustments. 
Selection of f and g 

[0082] For each benchmark and clusterization type, several functions f and g 
are chosen in regression Eq. (4). The linear functions f(x) = x and g(x) = x provide 
good results for all benchmarks, except for $M_T$. Performance was measured via 
the average value of $R^2$ for the regressions and the number of adjustments that had to 
be applied due to the regression constraints. The average $R^2$ over all possible scenarios 
was around 0.55 for the test set, and parameters had to be adjusted in 
approximately 30% of cases. The methodology had the best performance for 
benchmark $C_{T+20}$ and executions, with average $R^2$ = 0.62, and the worst performance 
for the benchmark $M_T$ and executions, with average $R^2$ = 0.45. It is assumed that the 
good performance for $C_{T+20}$ has the following explanation. As already mentioned 
above, benchmark $C_{T+20}$ is just a measure of general price movement and noise in 
the 20-day period. From this point of view, empirical cost percentiles for $C_{T+20}$ might 
depend very little on the underlying trades or orders, and thus the dependence on 
momentum and size values of the stocks traded will be weak as well. As a 
consequence, $\beta_i$ and $\gamma_i$ in Eq. (4) can be set to 0 so that Eq. (4) is transformed into 
[0083] $X_i = \alpha_i + \varepsilon_i$, i = 25, 40, 50, 60 or 75. Eq. (6) 

[0084] The poor performance of M T can be explained by the completely different 
behavior of its cost percentiles. The prevailing midquote benchmark is, probably, the 
purest benchmark that can mimic the unperturbed price. For small trade sizes, 
execution prices are naturally bounded by the bid and ask quotes of a stock and 
thus, by definition, costs with respect to the prevailing midquotes are bounded as 
well. As a result, all five cost percentiles must lie very close to each other, which, 
unfortunately, results in the violation of the regression constraints. Through empirical 
studies, it was determined that the functions 

[0085] $f(x) = f_1(f_2(x))$ and $g(x) = |x|^{3/4}$, Eq. (7) 

[0086] where 

[0087] $f_1(x) = x^{1/10}$ and $f_2(x) = \begin{cases} x^4 / 0.02^3, & x < 0.02 \\ x, & x \ge 0.02 \end{cases}$ Eq. (8) 

[0088] in regression Eq. (4) for benchmark $M_T$ yield the most satisfactory results. 
[0089] The function f 2 transforms sizes of less than 2% of ADV into even smaller 
values. The transformation has the desired effect that percentile cost estimates of 
small trade sizes do not differ significantly. $f_1$ and g model the overall non-linear 
behavior of $X_i$ in the variables S and M, respectively. 
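
The transformations for the midquote benchmark are easy to state in code. The sketch below assumes that Size is expressed as a fraction of ADV (so 2% of ADV is 0.02) and is non-negative; the Python function names simply mirror the symbols of Eqs. (7) and (8).

```python
def f2(x):
    """Shrink sizes below 2% of ADV; leave larger sizes unchanged."""
    return x**4 / 0.02**3 if x < 0.02 else x

def f1(x):
    """Concave power transform x^(1/10)."""
    return x ** 0.1

def f(x):
    """Size transform f(x) = f1(f2(x)) of Eq. (7)."""
    return f1(f2(x))

def g(m):
    """Momentum transform g(m) = |m|^(3/4) of Eq. (7)."""
    return abs(m) ** 0.75
```

Both branches of `f2` agree at x = 0.02, so the transform is continuous; for a size of 1% of ADV it returns 0.00125, that is, a value eight times smaller than the input.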
[0090] Figs. 8-10 illustrate typical plots of estimated and realized cost 
percentiles versus trade sizes for the benchmarks $C_{T+20}$, $V_T$ and $M_T$. Fig. 8 shows 
estimated and realized cost percentiles versus trade sizes. The estimates are based 
on all executions that had momentum values within the range (-0.02, 0.02). All 
estimates have been derived using regression technique WLS1. f and g have been 
selected as f(x) = x and g(x) = x. Costs are measured relative to benchmark $C_{T+20}$. 
[0091] Fig. 9 displays estimated and realized cost percentiles versus trade sizes. 
The estimates are based on all executions of Large cap stocks that had momentum 
values within the range (-0.02, 0.02). All estimates have been derived using 
regression technique WLS1. f and g have been selected as f(x) = x and g(x) = x. 
Costs are measured relative to benchmark $V_T$. 

[0092] Fig. 10 shows estimated and realized cost percentiles versus trade sizes. 
The estimates are based on all executions that had momentum values within the 
range (-0.02, 0.02). All estimates have been derived using regression technique 
WLS1. f and g have been selected according to Eq. (7). Costs are measured 
relative to benchmark $M_T$. In Figs. 8 and 10, the estimates are based on all 
executions with Momentum values within the range (-0.02, 0.02). 
[0093] Fig. 9 contains cost percentiles for all Large cap stocks and executions 
with Momentum values within the range (-0.02, 0.02). As discussed above, the 
Figures show different behavior of cost percentiles for various benchmarks. Note that 
the scale on the y-axis varies considerably from benchmark to benchmark. Peer cost 
distributions for benchmark $C_{T+20}$ are generally flat and heavy-tailed, and the form of 
the distribution does not change drastically as the trade size increases. This is 
different for the benchmarks $V_T$ and $M_T$. In both cases, the standard deviations of 
peer cost distributions change considerably as trade sizes increase (for $M_T$ they 
increase, for $V_T$ they decrease). 

[0094] Fig. 11 displays estimated and realized cost percentiles versus momentum 
for the benchmark $M_T$. Fig. 11 illustrates that the cost percentiles depend on the 
variable Momentum in a non-linear fashion. Executions with high absolute values of 
short-term momentum appear to be more costly. The estimates are based on all 
executions. All estimates have been derived using regression technique WLS1. f 
and g have been selected as f(x) = x and $g(x) = |x|^{3/4}$. 
Modeling the tails of peer cost distributions 

[0095] It is well-known that empirical cost distributions are generally asymmetric 
and heavy-tailed. The asymmetry has been incorporated in the two-step 
methodology of the present invention by using five independent regression equations 
for the estimation of the 25th-, 40th-, 50th-, 60th- and 75th -percentiles. The heavy 
tails of the peer cost distributions can be modeled by Pareto distributions that are 
commonly used in extreme value theory (see e.g. Embrechts, Klueppelberg and 
Mikosch [6]). Only the modeling of the left tail of a peer cost distribution function F is 
described here; the right tail can be modeled in a similar way. 

[0096] Assuming Pareto-type tail behavior, the left tail of F is 
modeled as 

$F(x) = c\,(X_{25} + z - x)^{-\kappa}$, for $x < X_{25}$, Eq. (9) 

where c, z and $\kappa$ are positive constants determined from the conditions: 

(i) $0.25 = F(X_{25})$, 

(ii) $0.15 / (X_{40} - X_{25}) = F'(X_{25})$, and 

(iii) $0.0001 = F(-10{,}000)$. 

[0097] Condition (i) follows directly from the definition of $X_{25}$ and Eq. (9), 
condition (ii) guarantees that the peer cost distribution function F is smooth at $X_{25}$, 
and condition (iii) assumes that all peer cost distributions must have virtually finite 
ranges. Selection of the function (9) does not assume that the function can be equal 
to 0, but condition (iii) makes costs below -10,000 basis points practically 
impossible. 

[0098] Conditions (i), (ii) and (iii) define the left tail of the distribution function F 
uniquely, and the percentiles $X_1, \ldots, X_{24}$ can be derived. 
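
One way to solve conditions (i)-(iii) numerically is sketched below. It assumes that condition (ii) matches the slope 0.15/(X_40 - X_25) of the linearly interpolated distribution function; conditions (i) and (ii) then give κ = 4·slope·z and c = 0.25·z^κ, and condition (iii) leaves a single equation in z that is solved by bisection. The function names are hypothetical.

```python
import math

def fit_left_tail(x25, x40, floor=-10_000.0, p_floor=1e-4):
    """Solve (i)-(iii) for (c, z, kappa) in F(x) = c*(x25 + z - x)**(-kappa)."""
    if x40 <= x25:
        raise ValueError("x40 must exceed x25")
    slope = 0.15 / (x40 - x25)         # assumed reading of condition (ii)
    d = x25 - floor                    # distance from X_25 down to the floor
    target = math.log(p_floor / 0.25)  # condition (iii) in log form

    def h(z):
        # kappa * log(z / (z + d)) with kappa = 4 * slope * z; decreasing in z.
        return 4.0 * slope * z * math.log(z / (z + d))

    if 4.0 * slope * d <= -target:
        raise ValueError("conditions (i)-(iii) have no solution for these inputs")
    lo, hi = 1e-12, 1.0
    while h(hi) > target:              # expand until the root is bracketed
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("could not bracket the solution")
    for _ in range(200):               # plain bisection on the monotone h
        mid = 0.5 * (lo + hi)
        if h(mid) > target:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    kappa = 4.0 * slope * z
    c = 0.25 * z**kappa
    return c, z, kappa

def left_tail_percentile(p, x25, c, z, kappa):
    """Invert Eq. (9): cost x with F(x) = p/100, intended for 0 < p <= 25."""
    return x25 + z - (c / (p / 100.0)) ** (1.0 / kappa)
```

Since h decreases monotonically from 0 towards -4·slope·(X_25 + 10,000), a solution exists and is unique whenever that limit lies below log(0.0004), which is consistent with the uniqueness stated above.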

[0099] Since actual transaction costs are extremely noisy and heavy-tailed, a 
robust method to build peer group cost distributions is required. The present 
invention provides a methodology that estimates peer cost percentiles for six 
different benchmarks, two different clusterization types and all possible choices of 
scenarios. In the present invention, trading costs can be grouped by the factors 
Type, Market Capitalization, Side, Market, Size and Short-term Momentum. While 
the first four factors have discrete values as input, it may be assumed that the factors 
Size and Momentum can take any values in $[0, \infty)$ and $[-1, 1]$, respectively. 
[00100] The two-step approach provides smooth and robust estimates for all 
scenarios corresponding to any values of the numerical factors Size and Momentum. If 
Size and Momentum are subdivided into discrete groups $S_1, \ldots, S_m$ and $M_1, \ldots, M_n$; 
m, n > 1, respectively, the procedure for estimating peer cost distributions remains 
similar to the continuous case. For any partition $(S_j, M_k)$, $1 \le j \le m$ and $1 \le k \le n$, 
compute the average Size and Momentum (S, M) for the partition and determine the five 
percentiles $X_{25}, \ldots, X_{75}$ by inserting (S, M) in Eq. (4). All other percentile 
computations are identical to the continuous case. 

[00101] The present invention filters out outliers in a natural way. Moreover, in 
contrast to a simple OLS regression, the two-step approach yields percentile 
estimates for the whole peer cost distribution. There is no theoretical justification on 
how to subdivide Momentum and Size groups in the first step of our methodology 
optimally. Regressing percentiles on the average Size and Momentum is an 
approximation only. 

[00102] To measure performance of the two-step approach for an arbitrary 
scenario $y = (y_1, y_2, y_3, y_4, y_5)$ for Market Capitalization, Side, Market, Size and 
Momentum, one can compare the theoretical distributions with the corresponding 
empirical peer cost distributions (for $y_4$ and $y_5$ one can choose intervals 
$[y_4 - \Delta y_4, y_4 + \Delta y_4]$ and $[y_5 - \Delta y_5, y_5 + \Delta y_5]$). Comparing the theoretical with the empirical 
distributions provides an idea of how well the methodology works. Empirical studies 
performed by the present inventors have shown that in most cases estimated peer 
cost distributions are very close to the actual distributions. Percentile estimates of 
scenarios with very flat distributions appear to be less reliable. In particular, peer 
cost estimates for benchmark $C_{T+20}$ might differ significantly from the empirical peer 
cost characteristics. 
[00103] Figs. 12-15 illustrate four examples of theoretical and empirical 
cumulative peer cost distributions for different scenarios and benchmarks. The 
scenarios are abbreviated by X_Y_Z, where the character X stands for the 
corresponding category Market Capitalization, Y stands for the category Side and Z 
represents the category Market, assuming the codes presented in Fig. 1. The solid 
black line denotes the empirical cumulative distribution function in each figure. All 
estimated cumulative distribution functions have been derived using the two-step 
approach with WLS1. The functions f and g in Eq. (4) have been selected as 
indicated above. The selected Size and Momentum values are specified by two 
intervals. Estimated cost percentiles have been built using the point in the center of 
each of these intervals. 

[00104] Fig. 12 compares the estimated cumulative distribution function with 
its empirical counterpart. The distributions have been built using all executions that 
belong to Listed stocks (scenario A_A_N), have trade sizes of 40-50% of ADV and values 
for short-term momentum between -0.05 and -0.03. Costs have been measured 
relative to $C_{T-1}$. The estimated percentiles have been derived using the two-step 
approach with WLS1. The functions f and g in Eq. (4) have been chosen as f(x) = x 
and g(x) = x. The figure shows that the distributions are 
concentrated around the median. Some discrepancies can be observed around the 
25th and 75th percentiles. The discrepancies might have appeared because the 
regression constraints on Eq. (4) were not satisfied and thus the parameters 
($\alpha_{25}$, $\beta_{25}$, $\gamma_{25}$) and ($\alpha_{75}$, $\beta_{75}$, $\gamma_{75}$) had to be adjusted. Another, simpler 
explanation might be that the scenario has only a limited number of empirical 
observations. As a consequence, the empirical cumulative distribution might not be 
robust enough for comparison. 
[00105] Fig. 13 presents the comparison for all executions that belong to Mid 
cap, Listed stocks with trade sizes between 0.4% and 0.6% of ADV and short-term 
momentum values around 0. Note that this is a scenario to which a lot of 
observations belong. Therefore, it can be expected that the empirical cumulative 
distribution function is robust. The plot shows that both cumulative distribution 
functions almost coincide. A similarly good performance can be observed in Fig. 14. In 
this figure, both distributions have been created using benchmark Cm and scenario 
S_S_N with $y_4$ = 0.14 and $y_5$ = 0, i.e. sell trades belonging to Small cap stocks with 
trade sizes around 14% of ADV and short-term momentum around 0. Fig. 15 
illustrates the comparison for benchmark $C_{T+20}$ and scenario S_A_Q with trade sizes 
around 1% of ADV and momentum values around -0.1. The chart again demonstrates 
an extraordinarily good fit for percentiles between the 25th and 75th percentiles. 
However, in contrast to the other figures, the empirical and estimated 
cumulative distribution functions do not coincide in the tails. A possible reason might 
be that the assumptions made for the tail behavior, discussed above, are not always 
applicable for benchmark $C_{T+20}$. In particular, costs below -10,000 and above 10,000 
b.p., respectively, may regularly occur and thus the threshold value 0.0001 is, 
probably, too low. 

[00106] The presented charts can be viewed as a representative sample to 
assess performance of the two-step approach. The methodology provides consistent 
cost percentile estimates for the selection of the benchmarks, clusterization types 
and scenarios. By construction, estimates of the median are the most accurate, while 
percentiles for the tails are based on modeling assumptions and, therefore, can 
potentially differ from actual percentiles. One could suggest estimating more 
percentiles in Eq. (4). However, increasing the number of percentiles that are 
estimated by a regression equation has a big drawback: the more regressions one 
adds to Eq. (4), the more adjustments and estimation errors can occur. We 
believe that the current method provides the most accurate percentile estimates 
around the center of the distribution as well as good percentile estimates overall. 
[00125] One skilled in the art will understand that the above methodology may 
be implemented in any number of ways. For example, referring to Fig. 16, a system 
100 for estimating transaction costs for peer institutions can include a processor 
unit 102 and a PGD database 104, coupled with a network 106, such as the Internet. 
Institutional traders use various client systems for performing securities transactions. 
For example, a client interface 108 may use a trader client 108 to trade on NASDAQ 
200. 

[00126] Tools can be used to collect trade data. For example, ITG markets a 
product called TCA™ (transaction cost analysis), which can collect and analyze 
transaction data. This tool may be used to collect transaction data and download the 
data to PGD database 104. As transactional data is collected, the benchmarks may 
be calculated in real time as the data arrives, or the data can be collected and 
processed later in batch. The data may be separated or organized according to cost factors, such 
as Size, Type, etc. 

[00127] Periodically, such as once a month, or once a week, the two-step 
statistical analysis described above is performed on the transaction data to generate 
cost estimates for each institution for each scenario. First, data is grouped according 
to size and momentum and, second, each percentile (i) is regressed using linear 
interpolation, and other techniques described above. The data can be presented to 
a user in any number of ways. 

[00128] Accordingly, processor unit 102 may be appropriately outfitted with 
software and hardware to perform the processes described above, and configured to 
communicate with database 104 as necessary. One skilled in the art will understand 
that the system may be programmed using a number of conventional programming 
techniques and may be implemented in a number of configurations, including 
centralized or distributed architectures. 

[00129] Peer investment institutions may access the PGD via a client interface. 
An exemplary display is shown in Fig. 17. Display 300 shows a peer institution's 
performance for a particular benchmark relative to the entire cost distribution. The 
X-axis is Size, by percentile, and the Y-axis is cost relative to the benchmark, in basis 
points (bps). The cost distribution can be represented by bars, or in any other graphical 
fashion, to show the peer institution's estimated costs with reference to the entire 
PGD. For example, graph 300 shows the current peer's performance as being 
relatively good, relative to the entire PGD, for transaction sizes of less than 1%, 1% 
to 5%, 5% to 10%, 25% to 50% and for transaction sizes over 50%. This particular 
institution performs poorly for transaction sizes of 10% to 25%. This is merely an 
example of one way that meaningful results can be presented graphically, and one 
having ordinary skill in the art will readily recognize that once costs are estimated for 
a particular institution, for all benchmarks, groups and percentiles, there are many 
ways to present the results, either graphically or otherwise, in a meaningful fashion. 
[00130] Thus, the present invention has been fully described with reference to 
the drawing figures. Although the invention has been described based upon these 
preferred embodiments, it would be apparent to those skilled in the art that certain 
modifications, variations, and alternative constructions would be apparent, while 
remaining within the spirit and scope of the invention. In order to determine the 
metes and bounds of the invention, therefore, reference should be made to the 
appended claims. 